This document details the analyses and provides results for the primary analyses of the VIBRANT trial.
Loading the data
The data used for these analyses are stored in an R Bioconductor MultiAssayExperiment object which contains the QCed and pre-processed data. The code and description of how raw data have been processed and QCed is available on the VIBRANT-0-QC-and-preprocessing GitHub repository (Still private at the moment, but will be made public upon publication, and anyone wishing to have access can email Laura Symul).
Loading the RDS file containing the unblinded VIBRANT MAE: /04 unblinded MAEs/mae_full_20250624.rds
Code
mae <-readRDS(mae_file)rm(mae_file)
Demographics (Table 1)
Summary table
The demographics table is built as follow.
We first create a table1_datadata.frame from the participant_crfs_merged table stored in the @metadata slot of the mae.
This table contains almost all variables needed to create the demographics table (i.e., site, age, race, ethn, cut_size_meals, eat_less, hungry_did_not_eat, sexual_partners_lifetime, sexual_partners_past_month, contraception_method).
In addition, the mae@colData contains the participants’ arm and study population flags (here we keep the mITT population).
From the variables listed above, we create the variable food_insecurity using the three variables cut_size_meals, eat_less and hungry_did_not_eat: if a participant answered “Yes” to any of the corresponding questions, then she is considered to be affected by food insecurity.
table1_data |>ggplot(aes(x = food_insecurity)) +geom_bar() +labs(x="Food insecurity", y ="Count")+theme_classic()
In addition to these variables, we also include the participants’ partner’s gender.
Participants are asked about their partner’s gender at all weekly visits (1000, 1100, 1200, 1300, 1400, 1500, 1700, 1900, 2120) using CRF 19.
Answers to that question can be found in the past_week_sex_partner column of the mae@metadata$visits_crfs_merged table. Participants’ answer throughout the weekly visits will be summarized into one of these 4 categories:
No sex
Only male
Only female
Both
This variable is added to the table1_data table.
Code
gender_sexual_partner <-metadata(mae)$visits_crfs_merged |>select(pid, past_week_sex_partner) |>filter(!is.na(past_week_sex_partner)) |>group_by(pid) |>summarise(gender_sexual_partner =case_when(all(past_week_sex_partner =="No sex in the past week") ~"No sex",any(past_week_sex_partner %in%c("A single male partner", "Multiple male partners")) &!any(past_week_sex_partner %in%c("A single female partner", "Multiple female partners")) ~"Only man",any(past_week_sex_partner %in%c("A single female partner", "Multiple female partners")) &!any(past_week_sex_partner %in%c("A single male partner", "Multiple male partners")) ~"Only female",any(past_week_sex_partner %in%c("A single male partner", "Multiple male partners")) &any(past_week_sex_partner %in%c("A single female partner", "Multiple female partners")) ~"Both",TRUE~NA_character_ ),.groups ="drop" )table1_data <- table1_data |>left_join(gender_sexual_partner, by ="pid")
We also re-code the race variable as follows:
“Asian or South Asian” is re-coded to “Asian” (for brevity)
“Black, African American, or African” and “African” are re-coded to “Black/African” (for brevity)
“Coloured”, “White”, “Other”, and “Prefered not to answer” are left as is.
Note
@Disebo & Caroline: do we go for British or American English? (Colored vs Coloured)?
Code
table1_data <- table1_data |>mutate(across(where(is.character), as.factor),race =fct_recode( race,"Asian"="Asian or South Asian","Black/African"="Black, African American, or African","Black/African"="African","Coloured"="Coloured","White"="White","Other"="Other","Prefer not to answer"="Prefer not to answer" ),race =fct_relevel(race, "Asian", "Black", "Coloured", "White", "Other", "Prefer not to answer"),arm = arm |>fct_drop() ) |>set_variable_labels(food_insecurity ="Report food insecurity in past 12 months ",sexual_partners_past_month ="Number of partners in past month",sexual_partners_lifetime ="Number of partners in lifetime",ethn ="Ethnicity", contraception_method ="Contraception", gender_sexual_partner ="Gender of sexual partners" )
Warning
@Laura Vermeren: I changed the summary statistics from mean to median for the continuous variables (for number of past sexual partner, the mean was artificially high in the arm with the 90 partner MGH participant). Would you be able to check that the tests selected by the gt_summary package are appropriate for the data we have? They should be, but I’d love if we could double-check :)
2 Kruskal-Wallis rank sum test; Fisher’s exact test
Study population characteristics appear balanced across arms at the CAPRISA site. Given the low number of participants at the MGH site in two of the five arms, we create an additional table excluding these two arms.
2 Kruskal-Wallis rank sum test; Fisher’s exact test
Besides the number of lifetime sexual partners, arms appear balanced across arm at MGH.
Primary outcomes (Table 2)
Colonization by week 5
The primary outcome is defined as “Colonization by any of the LBP strains after product administration by week 5”, and colonization by any of the LBP strain is defined as a relative abundance of 5% by any single strain or 10% total relative abundance of the LBP strains. The relative abundance of LBP strain is estimated using metagenomics (taxonomic composition estimated using VIRGO2 and strain proportion of total L. crispatus is estimated using kSanity), and includes samples from weekly visits 1200 to 1500 (week 2 (post-INT) to week 5 (3 weeks post-INT)).
table_2 |>group_by(site) |>gt(caption ="Table 1: Colonization by week 5 by arm and site", row_group_as_column =TRUE) |>cols_width("name"~px(200),everything() ~px(120) ) |>cols_label(name ="",# Blinded = "All blinded arms",Pl ="Placebo",`LC-106-7`="LC-106<br>7 days",`LC-106-3`="LC-106<br>3 days",`LC-106-o`="LC-106<br>overlap",`LC-115`="LC-115<br>7 days",.fn = md )
Table 1: Colonization by week 5 by arm and site
Placebo
LC-106 7 days
LC-106 3 days
LC-106 overlap
LC-115 7 days
CAP
N participants
N = 14
N = 14
N = 14
N = 14
N = 14
n (%) participants with LBP strain detected
0 (0 %)
9 (64 %)
7 (50 %)
7 (50 %)
10 (71 %)
95% CI
0% - 23%
35% - 87%
23% - 77%
23% - 77%
42% - 92%
MGH
N participants
N = 5
N = 7
N = 1
N = 1
N = 6
n (%) participants with LBP strain detected
1 (20 %)
6 (86 %)
1 (100 %)
1 (100 %)
6 (100 %)
95% CI
1% - 72%
42% - 100%
3% - 100%
3% - 100%
54% - 100%
Visualization of the primary outcome (and a few selected secondary outcomes)
Code
col_by_week5 |>arrange(site, arm, LBP_colonization_by_week5) |>group_by(site, arm) |>mutate(participant_number =row_number() |>factor()) |>ungroup() |>ggplot(aes(x = participant_number, y = site |>fct_rev(), col = site, fill = site, shape = LBP_colonization_by_week5)) +geom_point(size =3) +facet_grid(. ~ arm, scales ="free_x", space ="free_x") +guides(color ="none", fill ="none") +scale_shape_manual("Colonized by LBP by week 5", values =c(1, 16)) +scale_color_manual(values = site_colors) +xlab("Participant number") +ylab("") +theme(legend.position ="top",axis.text.x =element_text(angle =90, hjust =1, vjust =0.5) )
Code
col_by_week5_qPCR |>arrange(site, arm, LBP_colonization_by_week5) |>group_by(site, arm) |>mutate(participant_number =row_number() |>factor()) |>ungroup() |>ggplot(aes(x = participant_number, y = site |>fct_rev(), col = site, fill = site, shape = LBP_colonization_by_week5)) +geom_point(size =3) +facet_grid(. ~ arm, scales ="free_x", space ="free_x") +guides(color ="none", fill ="none") +scale_shape_manual("Colonized by LBP by week 5", values =c(1, 16)) +scale_color_manual(values = site_colors) +xlab("Participant number") +ylab("") +ggtitle("Colonization based on the relative abundances estimated by qPCR") +theme(legend.position ="top",axis.text.x =element_text(angle =90, hjust =1, vjust =0.5), plot.title =element_text(hjust =0.5) )
Code
detected_by_week5_qPCR |>arrange(site, arm, LBP_detected_by_week5) |>group_by(site, arm) |>mutate(participant_number =row_number() |>factor()) |>ungroup() |>ggplot(aes(x = participant_number, y = site |>fct_rev(), col = site, fill = site, shape = LBP_detected_by_week5)) +geom_point(size =3) +facet_grid(. ~ arm, scales ="free_x", space ="free_x") +guides(color ="none", fill ="none") +scale_shape_manual("Colonized by LBP by week 5", values =c(1, 16)) +scale_color_manual(values = site_colors) +xlab("Participant number") +ylab("") +ggtitle("Colonization based on qPCR-based detection of at least 2 LBP strains detected at the same visit by week 5") +theme(legend.position ="top",axis.text.x =element_text(angle =90, hjust =1, vjust =0.5), plot.title =element_text(hjust =0.5) )
Code
crisp_dom_by_5week |>arrange(site, arm, crisp_dom_by_5week) |>group_by(site, arm) |>mutate(participant_number =row_number() |>factor()) |>ungroup() |>ggplot(aes(x = participant_number, y = site |>fct_rev(), col = site, fill = site, shape = crisp_dom_by_5week)) +geom_point(size =3) +facet_grid(. ~ arm, scales ="free_x", space ="free_x") +guides(color ="none", fill ="none") +scale_shape_manual("Colonized by LBP by week 5", values =c(1, 16)) +scale_color_manual(values = site_colors) +xlab("Participant number") +ylab("") +ggtitle("L. crispatus dominance (≥ 50% by metagenomics)") +theme(legend.position ="top",axis.text.x =element_text(angle =90, hjust =1, vjust =0.5), plot.title =element_text(hjust =0.5) )
Tests
We test for differences between the placebo arm and each active arm for the primary outcome using Fisher’s exact test (exhaustive permutation test).
Data from both site are merged for each arm given the low numbers of participants at MGH.
The \(p\)-values are adjusted for multiple testing (multiple active arms against the Placebo) using the Benjamini-Hochberg method.
g_tot_prop_LBP_strains +geom_label(aes(label = pid_nb), size =2) +labs(caption ="Participants are labeled by their enrollment order within each arm and site." )
Proportions of participants who colonized in each arm and each visit
Microbiota trajectories of mITT participants in the VIBRANT study. The relative abundances of the LBP strains and top taxa are shown. Participants are ordered by their total relative abundance of LBP strains, and the study weeks are shown in the rows. The alpha level (transparency) indicates whether the participant is part of the per-protocol (PP) or modified intention-to-treat (mITT) population.
Code
# checking the data for the placebo participant that has the LBP strains at MGHmae[["mg"]] |>as_tibble() |> dplyr::left_join(mae@colData |>as_tibble()) |>filter(pid =="068100062", study_week ==5) mae[["qPCR"]] |>as_tibble() |> dplyr::left_join(mae@colData |>as_tibble()) |>filter(pid =="068100062", study_week ==5) mae[["amplicon"]] |>as_tibble() |> dplyr::left_join(mae@colData |>as_tibble()) |>filter(pid =="068100062", study_week ==5) |>select(.feature, .sample, rel_ab, exclude_sample)mae[["amplicon"]] |>as_tibble() |> dplyr::left_join(mae@colData |>as_tibble()) |>filter(pid =="068100062", .feature =="Lactobacillus_crispatus") |>select(.feature, .sample, visit_code, rel_ab, exclude_sample)
tmp <-# we use the metagenomics data for this mae[["mg"]] |>as_tibble() |>filter(!is.na(LBP), !poor_quality_mg_data) |>select(.feature, LBP, strain_origin, .sample, uid, rel_ab) |># add the arm, mITT, etc.left_join( mae |>colData() |>as_tibble() |>select(uid, pid, site, randomized, mITT, arm, visit_code, visit, PIPV), by =join_by(uid) ) |># only consider the mITTfilter(randomized, mITT) |>filter(PIPV, as.numeric(visit_code) >=1200) |># only include visits after the first 2 weeks# define the outcome of interest for each person and straingroup_by(.feature, LBP, strain_origin, pid, arm, site) |>summarize(ever_colonized =any(rel_ab >=0.05, na.rm =TRUE),.groups ="drop" ) |># exclude Placebo participants because, they were, in theory, not exposed to any of the strains.filter(arm !="Pl") |># define the total number of participants exposed for each strain.# For LC-115 strains, this is only participants of arm LC-115# For LC-106 strains, this is participants from all active arms.mutate(exposed =case_when( LBP =="LC-115"~ (arm =="LC-115"), LBP =="LC-106 & LC-115"~TRUE,TRUE~FALSE ) ) |># we compute the statistics of interestgroup_by(.feature, LBP, strain_origin, site) |>summarize(n_exposed =sum(exposed, na.rm =TRUE),n_colonized =sum(ever_colonized & exposed, na.rm =TRUE),.groups ="drop" ) |>mutate(prop_colonized = n_colonized / n_exposed,CI_low = binom::binom.confint(n_colonized, n_exposed, method ="wilson")$lower,CI_high = binom::binom.confint(n_colonized, n_exposed, method ="wilson")$upper )
Code
tmp |>ggplot() +aes(x = .feature, y = prop_colonized, col = site) +facet_grid(. ~ LBP + strain_origin, scales ="free", space ="free") +geom_errorbar(aes(ymin = CI_low, ymax = CI_high), linewidth =1, alpha =0.7,width =0.3, position =position_dodge(width =0.5) ) +geom_point(position =position_dodge(width =0.5), size =2) +scale_color_manual("Study site", values = site_colors) +xlab("LBP strain") +ylab("Proportion of participants colonized\namong those exposed") +theme(axis.text.x =element_text(angle =45, hjust =1) )
Code
tmp |>mutate(CI =str_c("[", scales::percent(CI_low, accuracy =1), " - ", scales::percent(CI_high, accuracy =1), "]" ),value =str_c( scales::percent(prop_colonized, accuracy =1), " (", n_colonized, "/", n_exposed,")<br>", CI ),strain_info =str_c(LBP, "<br>(", strain_origin, " strains)") ) |>select(-n_exposed, -n_colonized, -prop_colonized, -CI_low, -CI_high, -CI, -LBP, -strain_origin) |>pivot_wider(names_from =c(site), values_from = value ) |>arrange(strain_info) |>group_by(strain_info) |>gt(row_group_as_column =TRUE, process_md =TRUE,caption ="Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure." ) |>cols_label(.feature ="LBP strain") |>cols_align(columns = .feature, align ="left") |>cols_align(columns =c("CAP", "MGH"), align ="center") |>fmt_markdown(columns =everything())
Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure.
LBP strain
CAP
MGH
LC-106 & LC-115 (SA strains)
FF00018
7% (4/56) [3% - 17%]
20% (3/15) [7% - 45%]
FF00051
30% (17/56) [20% - 43%]
67% (10/15) [42% - 85%]
UC101
25% (14/56) [16% - 38%]
73% (11/15) [48% - 89%]
LC-106 & LC-115 (US strains)
C0022A1
25% (14/56) [16% - 38%]
47% (7/15) [25% - 70%]
C0059E1
61% (34/56) [48% - 72%]
87% (13/15) [62% - 96%]
C0175A1
39% (22/56) [28% - 52%]
67% (10/15) [42% - 85%]
LC-115 (SA strains)
122010
57% (8/14) [33% - 79%]
33% (2/6) [10% - 70%]
185329
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
FF00004
50% (7/14) [27% - 73%]
83% (5/6) [44% - 97%]
FF00064
50% (7/14) [27% - 73%]
83% (5/6) [44% - 97%]
FF00072
7% (1/14) [1% - 31%]
50% (3/6) [19% - 81%]
UC119
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
LC-115 (US strains)
C0006A1
50% (7/14) [27% - 73%]
67% (4/6) [30% - 90%]
C0028A1
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
C0112A1
64% (9/14) [39% - 84%]
83% (5/6) [44% - 97%]
Excluding immediate post-product visit
Code
tmp <-# we use the metagenomics data for this mae[["mg"]] |>as_tibble() |>filter(!is.na(LBP), !poor_quality_mg_data) |>select(.feature, LBP, strain_origin, .sample, uid, rel_ab) |># add the arm, mITT, etc.left_join( mae |>colData() |>as_tibble() |>select(uid, pid, site, randomized, mITT, arm, visit_code, visit, PIPV), by =join_by(uid) ) |># only consider the mITTfilter(randomized, mITT) |>mutate(include_visit =case_when( (arm %in%c("LC-106-7", "LC-115")) &as.numeric(visit_code) >1200~TRUE, (arm %in%c("LC-106-3", "LC-106-o")) &as.numeric(visit_code) >=1200~TRUE,TRUE~FALSE ) ) |>filter(PIPV, include_visit) |># only include visits after the first 2 weeks# define the outcome of interest for each person and straingroup_by(.feature, LBP, strain_origin, pid, arm, site) |>summarize(ever_colonized =any(rel_ab >=0.05, na.rm =TRUE),.groups ="drop" ) |># exclude Placebo participants because, they were, in theory, not exposed to any of the strains.filter(arm !="Pl") |># define the total number of participants exposed for each strain.# For LC-115 strains, this is only participants of arm LC-115# For LC-106 strains, this is participants from all active arms.mutate(exposed =case_when( LBP =="LC-115"~ (arm =="LC-115"), LBP =="LC-106 & LC-115"~TRUE,TRUE~FALSE ) ) |># we compute the statistics of interestgroup_by(.feature, LBP, strain_origin, site) |>summarize(n_exposed =sum(exposed, na.rm =TRUE),n_colonized =sum(ever_colonized & exposed, na.rm =TRUE),.groups ="drop" ) |>mutate(prop_colonized = n_colonized / n_exposed,CI_low = binom::binom.confint(n_colonized, n_exposed, method ="wilson")$lower,CI_high = binom::binom.confint(n_colonized, n_exposed, method ="wilson")$upper )
Code
tmp |>ggplot() +aes(x = .feature, y = prop_colonized, col = site) +facet_grid(. ~ LBP + strain_origin, scales ="free", space ="free") +geom_errorbar(aes(ymin = CI_low, ymax = CI_high), linewidth =1, alpha =0.7,width =0.3, position =position_dodge(width =0.5) ) +geom_point(position =position_dodge(width =0.5), size =2) +scale_color_manual("Study site", values = site_colors) +xlab("LBP strain") +ylab("Proportion of participants colonized\namong those exposed") +theme(axis.text.x =element_text(angle =45, hjust =1) )
Code
tmp |>mutate(CI =str_c("[", scales::percent(CI_low, accuracy =1), " - ", scales::percent(CI_high, accuracy =1), "]" ),value =str_c( scales::percent(prop_colonized, accuracy =1), " (", n_colonized, "/", n_exposed,")<br>", CI ),strain_info =str_c(LBP, "<br>(", strain_origin, " strains)") ) |>select(-n_exposed, -n_colonized, -prop_colonized, -CI_low, -CI_high, -CI, -LBP, -strain_origin) |>pivot_wider(names_from =c(site), values_from = value ) |>arrange(strain_info) |>group_by(strain_info) |>gt(row_group_as_column =TRUE, process_md =TRUE,caption ="Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure but excluding immediate post-product visit (i.e., excluding week 2 visit for the LC-106-7 and LC-115 arms)." ) |>cols_label(.feature ="LBP strain") |>cols_align(columns = .feature, align ="left") |>cols_align(columns =c("CAP", "MGH"), align ="center") |>fmt_markdown(columns =everything())
Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure but excluding immediate post-product visit (i.e., excluding week 2 visit for the LC-106-7 and LC-115 arms).